The current coronavirus disease 2019 (COVID-19) pandemic is a worldwide health
emergency like no other. COVID-19 infection in infants, children, and adolescents
is not the only issue causing concern among their families and professionals;
there are other serious issues that must be carefully detected and addressed.
The pandemic affects children’s healthcare in direct and indirect ways, not just
from a medical standpoint but also from social, psychological, economic, and
educational perspectives. All these factors may have affected children’s mental
development, particularly in rural settings. Bangladesh faces a major challenge
in its lack of public mental health facilities, especially in rural areas. We
therefore developed a machine learning method to predict the mental development
condition of rural children during the COVID-19 pandemic. This research work can
predict whether a rural child in Bangladesh is mentally developed or mentally
hampered, and the prediction results are encouraging.

1. INTRODUCTION

Coronavirus disease 2019 (COVID-19) is a deadly disease that causes severe respiratory syndromes.
It was first identified on 31 December 2019 in Wuhan city, Hubei Province, China [1], [2]. COVID-19
is a potentially fatal disease that passed from animals to humans [3], and the pandemic has affected
everyone in one form or another. Since the first case was reported in China, the disease has spread
rapidly across the whole world. On 11 March 2020, the World Health Organization (WHO) declared
COVID-19 a global pandemic [4]. Bangladesh is also affected by this deadly virus. The first three known
cases were reported on 8 March 2020 by the country’s Institute of Epidemiology, Disease Control and
Research (IEDCR) [5]. The government of Bangladesh and private organizations have taken several
initiatives to fight the spread of COVID-19 [6]. Since then, the pandemic has spread day by day over the
whole nation. To protect the population, the government declared a ‘lockdown’ throughout the nation
from 23 March to 30 May, and this lockdown was later extended [7]. As a result, all educational
institutions were closed during this time. Children struggle to cope with this situation because they are
facing a pandemic for the first time. Even in normal circumstances, 15% of children suffer from mental
disorders. Urban children can cope with the lockdown somewhat better because they generally do not
play outside anyway. In Bangladesh, however, rural children cannot cope


Journal homepage: http://ijece.iaescore.com 


5502 ISSN: 2088-8708


with this lockdown because they are used to playing outside and know less about mental healthcare.
In this situation, their mental health suffers because they cannot play or study, and because of family
violence and limited access to food and medical treatment. Watching TV, drawing, handcrafting, and
reading books are possible at home, but playing in the field carries a risk of COVID-19 infection, and
being unable to pursue their personal interests hampers their mental health. The children also do not
know whether their final exams will take place, which leaves them confused and tense about their
studies. For this reason, SciTech Academy has developed an educational portal on COVID-19 for kids
(ages 5-12) to raise children’s awareness of how to deal with the pandemic [8]; performance can also be
measured from overall academic status and environmental attributes [9]. Access to favorite foods also
has a big impact on mental health. Some of our respondents said they cannot have their favorite foods
right now because going to the market is risky for their parents during the pandemic. Children who stay
at home for a long period become bored. Sometimes they have nothing to do, and at those times they
bother their parents. Their parents then scold them, which harms the children’s mental health. Visiting
relatives’ houses regularly is a long-standing custom in villages, but during the pandemic it is not safe
to visit or travel anywhere. Primary students are especially keen to meet their cousins to gather and
have fun. Hangouts with school friends are also missing because of the pandemic. Both are important
for mental health and refreshment. Because of all the above issues, children are becoming depressed and
living under pressure. Depression, anxiety, and posttraumatic stress underlie mental or psychiatric
disorders [10], and mental health is shaped not only by physical factors but also by socioeconomic and
environmental factors [11]. This paper assesses children’s mental development using machine learning
on data from 700 children. With this work, parents can find out whether their children are mentally
developed or hampered. We used five machine learning algorithms that achieved good accuracy; other
algorithms we tried did not give good accuracy.


2. LITERATURE REVIEW 

Srividya et al. [12] applied various machine learning (ML) algorithms, namely support vector machine
(SVM), decision tree (DT), naive Bayes classifier, k-nearest neighbors (KNN) classifier, random forest
(RF), and logistic regression, to identify the state of mental health in a target group. They obtained the
highest accuracy, 90%, with RF. Hahn et al. [13] proposed guidelines, provided a conceptual
introduction, and fostered widely discussed predictive analytics projects in psychiatry, covering
modeling technology and all stakeholders. To create their model, they employed SVM, KNN, naive
Bayes, and logistic regression, and they used ensemble classification systems to get a satisfactory
outcome. Aghaei et al. [14] sought, through questionnaires, to predict general health from people’s life
orientation, quality of life, and age, using descriptive and multivariate regression analyses.
Islam et al. [15] used DT, KNN, and SVM to diagnose depression from social network data and found
the DT quite precise. Smets et al. [16] used six machine learning methods for mental disease detection
and indicated the adequacy of Bayesian networks and SVM. Guntuku et al. [17] examined the prediction
of depression and mental illness from social media and suggested that depression and other mental
illnesses are predictable in many online contexts. Tate et al. [18] predicted mental health problems in an
adult scenario using logistic regression, eXtreme gradient boosting (XGBoost), RF, SVM, and neural
network models; their highest precision, 73%, was obtained with the SVM. Bijl et al. [19] reported the
prevalence of psychiatric disorders in the general population from the Netherlands Mental Health
Survey and Incidence Study (NEMESIS). Using coarse dichotomies, they showed that the 12-month
prevalence of all mental disorders was 30% higher in urban areas than in rural regions. Copur and
Karasu [20] assessed how the COVID-19 pandemic affects the quality of life and the depression,
anxiety, and stress levels of people over 18 years of age. The data were collected using a
socio-demographic questionnaire. With respect to concern about the COVID-19 pandemic and home
confinement during this period, a statistically significant relationship was observed with age, gender,
health, and concurrent chronic and mental illness. Sifat et al. [21] conducted three focus group
discussions (FGDs) with 23 respondents about their experiences during the COVID-19 pandemic and
reported the findings. The purpose of that study was to learn more about the effects of the COVID-19
lockdown on health, mental stress, and social and economic factors. Vilca et al. [22] assessed the impact
of the fear of contracting COVID-19 on anxiety, depression, and insomnia in 947 university students of
both sexes (41.6% males and 58.4% females) between the ages of 18 and 35; the findings revealed that a
significant percentage of the students had significant anxiety (23%), depression (24%), and insomnia
symptoms (32.9%). Islam et al. [23] presented the design, importance, and preliminary evaluation of a
prototype RA that supports people during the COVID-19 crisis. Chowdhury et al. [24] described
rendering a picture in a new style as a challenging problem in image processing, noting that the
representation of an image is very significant since an image carries a lot of information.
Chaudhry et al. [25] identified stressors, coping methods, and problems with coping techniques, which
can be used to guide the development of stress management apps for frontline health professionals.
Given the ongoing epidemic and ongoing healthcare challenges, frontline health professionals remain a
vulnerable population that needs special attention. Shetu et al. [26] collected data from students through
an online survey and analyzed them to identify what is essential for success or failure in academic
performance. They also tried to identify students’ personal behaviors that affect their academic status
using data mining techniques. Saifuzzaman et al. [27] described the current situation and recent case
studies of COVID-19 across Bangladesh, while Shetu et al. [28] proposed an effective e-learning
framework for the COVID-19 situation.


3. METHOD AND ALGORITHM ANALYSIS 

We train our model with supervised machine learning; classification is a supervised learning approach.
A classifier is a machine learning model that discriminates between different objects based on certain
features. Put simply, a classifier predicts the class of an object from its feature values. In our dataset, we
want to predict “mental hamper”. We therefore divided our dataset into three classes, namely “Yes”,
“No”, and “Partial”, which indicate the mental health condition. Before training our model, we split our
dataset in a 70:30 ratio: 70% of the data was used for training and the remaining 30% was held out for
testing. Figure 1 shows our work process. To solve our problem, we used different machine learning
algorithms.
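
The 70:30 split described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' implementation; the function name `train_test_split_70_30`, the seed, and the stand-in records are assumptions made for the example.

```python
import random


def train_test_split_70_30(rows, train_fraction=0.7, seed=42):
    """Shuffle rows reproducibly and split them into train and test lists."""
    shuffled = list(rows)                  # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)  # seeded shuffle for repeatability
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


# Hypothetical stand-in for the 700 survey records (indices only).
records = list(range(700))
train, test = train_test_split_70_30(records)
print(len(train), len(test))  # 490 210
```

Shuffling before splitting avoids any ordering effect from the way the responses were collected; a stratified split would additionally keep the class proportions equal in both parts.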


[Figure 1: flowchart with the stages Data Collection, Data Cleaning, Data Normalization, Feature
Extraction, Training Data, Mental Health Classification, and Detection Outcome]

Figure 1. Methodological overview of mental health prediction


3.1. Decision tree 

The decision tree is one of the most powerful and popular classifiers for classification and
prediction. A decision tree classifier has a tree structure in which each internal node represents a test on
an attribute, each branch represents an outcome of the test, and each leaf node holds a class label.
Decision trees are commonly used in operations research, for example in decision analysis, where they
help to determine a rule for achieving a goal. An example of a decision tree is shown in Figure 2.


3.2. K-nearest neighbors 

K-nearest neighbors is one of the simplest machine learning algorithms. The approach finds a number
of training samples closest in distance to the new point and predicts its label from them. The number of
samples can be a user-defined constant or vary according to the local density of points. The standard
Euclidean distance is the most commonly used measure of the distance between two points. KNN has
been very successful in several classification and regression tasks, including handwritten digit
recognition and satellite image processing.
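
The nearest-neighbor rule described above can be sketched in a few lines. This is an illustrative standard-library version, not the implementation used in the study; the toy points, labels, and the choice k = 3 are assumptions made for the example.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Standard Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_points, train_labels, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest samples."""
    ranked = sorted(
        zip(train_points, train_labels),
        key=lambda pair: euclidean(pair[0], query),
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical two-feature training data with two well-separated groups.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (2.0, 2.0), (2.1, 1.9), (1.9, 2.2)]
labels = ["No", "No", "No", "Yes", "Yes", "Yes"]

print(knn_predict(points, labels, (0.15, 0.1)))
print(knn_predict(points, labels, (2.0, 2.1)))
```

Because the distance is computed on raw feature values, features should be on comparable scales before KNN is applied, which is one reason to standardize the data first.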

3.3. Gaussian naive Bayes

Naive Bayes is a method based on Bayes’ theorem. The naive Bayes classifier assumes that the
presence of a particular feature in a class is unrelated to the presence of any other feature. The model is
quick to build and especially useful for very large datasets. Despite its simplicity, naive Bayes is known
to sometimes outperform even sophisticated classification methods.


$$p(x \mid y) = \frac{p(y \mid x)\, p(x)}{p(y)} \qquad (1)$$


where p(x|y) is the posterior probability of class x given the predictor y, p(x) is the prior probability of
class x, p(y|x) is the probability of the predictor given the class, and p(y) is the prior probability of the
predictor. Naive Bayes makes the strong assumption that the predictors are independent of each other.
By assuming a Gaussian distribution for each predictor, naive Bayes can be extended to real-valued
attributes. This extension is called Gaussian naive Bayes.
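
As a rough sketch of how Gaussian naive Bayes applies (1), the code below estimates a prior and a per-feature mean and variance for each class, then picks the class with the largest log posterior. It is an illustration under stated assumptions, not the study's code: the function names, the small variance floor, and the toy data are all hypothetical.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(features, labels, var_floor=1e-9):
    """Estimate class priors and per-feature mean and variance for each class."""
    rows_by_class = defaultdict(list)
    for row, label in zip(features, labels):
        rows_by_class[label].append(row)
    model = {}
    for label, rows in rows_by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # var_floor (an assumption) avoids division by zero for constant features.
        variances = [
            sum((v - m) ** 2 for v in col) / n + var_floor
            for col, m in zip(columns, means)
        ]
        model[label] = (n / len(features), means, variances)
    return model


def log_gaussian(value, mean, var):
    """Log density of a normal distribution with the given mean and variance."""
    return -0.5 * math.log(2 * math.pi * var) - (value - mean) ** 2 / (2 * var)


def predict_gaussian_nb(model, row):
    """Return the class with the highest log prior plus summed log likelihoods."""
    scores = {}
    for label, (prior, means, variances) in model.items():
        # Independence assumption: per-feature log likelihoods simply add up.
        scores[label] = math.log(prior) + sum(
            log_gaussian(v, m, s) for v, m, s in zip(row, means, variances)
        )
    return max(scores, key=scores.get)


# Hypothetical real-valued features for two classes.
toy_features = [(1.0, 2.0), (1.2, 1.8), (0.8, 2.2), (3.0, 0.5), (3.2, 0.7), (2.8, 0.3)]
toy_labels = ["No", "No", "No", "Yes", "Yes", "Yes"]
nb_model = fit_gaussian_nb(toy_features, toy_labels)
print(predict_gaussian_nb(nb_model, (1.1, 2.1)))
print(predict_gaussian_nb(nb_model, (3.1, 0.4)))
```

The denominator p(y) in (1) is the same for every class, so it can be dropped when only the most probable class is needed; working in log space avoids numerical underflow when many features are multiplied.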


Figure 2. DT figure taken from dlpng.com 


3.4. Support vector machine 

The support vector machine (SVM) is a supervised machine learning algorithm that can solve both
classification and regression problems. It is suitable for both linearly and nonlinearly separable data,
and kernel tricks can make it more effective. It works well in both high- and low-dimensional data
spaces. An SVM separates a dataset into classes by a hyperplane chosen to have the maximum margin
from the classes: to divide an n-dimensional space, an (n-1)-dimensional hyperplane is drawn with
maximum margin. In Figure 3, a two-dimensional space is divided by a (2-1=1)-dimensional
hyperplane, that is, a line.


[Figure 3: diagram labeled Maximum Margin, Positive Hyperplane, and Maximum Margin Hyperplane]

Figure 3. SVM hyperplane figure taken from javatpoint.com


Kernel, degree, C, and gamma are the parameters of the SVM, and we tuned them to obtain a good
result; the results change considerably with each of them. The kernel defines the shape of the decision
boundary. The kernel types are linear, radial basis function (RBF), polynomial, and sigmoid. The degree
applies to the polynomial kernel and controls the curvature of the decision boundary: a higher degree
allows a more curved boundary and a lower degree a flatter one. The degree values considered range
from 0 to 6, and the default value is 3. C is the regularization parameter, which controls how many
misclassified training points (outliers) are tolerated. If the value of C is high, few such points are
tolerated, and if C is low, more are tolerated. C has no specific range beyond being positive, and its
default value is 1. The gamma parameter indicates how far the influence of a single training example
reaches: a low value of gamma means ‘far’ and a high value of gamma means ‘close’.
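
The effect of gamma can be illustrated with the RBF kernel itself. The sketch below uses the standard definition k(a, b) = exp(-gamma * ||a - b||^2), which is not written out in this paper; the two points and the gamma values are arbitrary choices for the example.

```python
import math


def rbf_kernel(a, b, gamma):
    """RBF kernel: exp(-gamma * squared Euclidean distance between a and b)."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-gamma * sq_dist)


# Two hypothetical points whose squared distance is 2.
a, b = (0.0, 0.0), (1.0, 1.0)
for gamma in (0.01, 1.0, 10.0):
    # Similarity near 1 means b still "feels" a; near 0 means it does not.
    print(f"gamma={gamma}: k(a, b)={rbf_kernel(a, b, gamma):.4f}")
```

With a small gamma the kernel value stays close to 1 even for distant points, so each training example influences a wide region; with a large gamma the value drops quickly toward 0, so influence is confined to the immediate neighborhood.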


3.5. Random forest 

Random forest (RF) is a widely used learning method for classification and regression. It builds a
number of decision trees during training. To classify a new case, the case is sent to every tree, and each
tree outputs a class. The RF then picks the final class by majority vote, that is, the class predicted by
the largest number of trees. RF is straightforward to understand and apply for novices and experts
alike.
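
The voting step can be sketched as below. Only the aggregation is shown; training the individual trees is omitted, and the list of per-tree outputs is hypothetical.

```python
from collections import Counter


def majority_vote(tree_predictions):
    """Return the class predicted by the largest number of trees."""
    return Counter(tree_predictions).most_common(1)[0][0]


# Hypothetical outputs of five trees for one child.
tree_outputs = ["Yes", "Partial", "Yes", "No", "Yes"]
print(majority_vote(tree_outputs))  # Yes
```

In this sketch a tie is resolved in favor of the class that appears first in the list; a real implementation may break ties differently, for example by averaging class probabilities across trees.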


4. DATA ANALYSIS 

For data collection, we created a Google Form and collected data door to door, maintaining social
distance because of the COVID-19 pandemic. We targeted students from class one to class ten, aged
between five and seventeen years; that is, our focus was on students who are minors. They had stayed
at home for more than five months, an experience that, as village residents, they were having for the
first time in their lives. To understand their mental health situation, we asked them several questions
based on international criteria of mental health. We presented the questions in both English and Bengali
so that they could understand them easily. We collected about 700 responses from the students. Of
these students, 54% were boys and the remaining 46% were girls. There were 316 children studying in
primary school and 384 children studying in high school. We collected this information from villages
in 5 districts of Bangladesh, as shown in Figure 4.


4.1. Data processing 

Our dataset contains 700 instances and 19 attributes. Some attributes were irrelevant to our work, so
we dropped those columns. After dropping them, 13 attributes remained that are relevant to child
mental health and our work, such as gender, class, acquiring personal interest, trying to go outside, how
much study is hampered, ability to concentrate on study, getting favorite food, parents’ relation, and
how much hangouts with relatives and friends are missed. As we built the dataset ourselves, there were
no missing values, so we did not need to handle any. We applied a label encoder, imported from its
library, to the feature columns of our dataset. We then scaled the dataset with a standard scaler. After
this, the dataset was ready for training and testing. To evaluate the impact on the children, we
calculated the impact column of our dataset mathematically and manually. The possible impacts are
‘Yes’, ‘Partial’, and ‘No’. ‘Yes’ means the child has a serious impact on mental health. ‘Partial’ means
the child also has an impact on mental health, but a mild one. ‘No’ means the child has little or no
impact on mental health. The class distribution is shown in Figure 5.
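
The two preprocessing steps, label encoding and standard scaling, can be sketched with the standard library as follows. This is an illustration rather than the library calls used in the study: the mapping yes = 2, partial = 1, no = 0 follows the encoding reported in the results section, while the function names and the sample column are assumptions.

```python
import math

# Encoding reported in the paper: yes -> 2, partial -> 1, no -> 0.
ANSWER_CODES = {"no": 0, "partial": 1, "yes": 2}


def encode_column(answers):
    """Replace each textual answer with its integer code."""
    return [ANSWER_CODES[a.strip().lower()] for a in answers]


def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if std == 0:
        return [0.0 for _ in values]  # constant column: nothing to scale
    return [(v - mean) / std for v in values]


# Hypothetical answers to one survey question.
column = ["Yes", "No", "Partial", "Yes"]
encoded = encode_column(column)
scaled = standardize(encoded)
print(encoded)  # [2, 0, 1, 2]
print([round(v, 3) for v in scaled])
```

Standardizing puts all feature columns on the same scale, which matters most for distance-based and margin-based classifiers such as KNN and SVM.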


[Figure 4: bar chart, y-axis: count]
Figure 4. No. of children of individual class

[Figure 5: bar chart of the target classes (Yes, Partial, No), x-axis: target, y-axis: count]
Figure 5. Class distribution

5. RESULT AND DISCUSSION

The aim of this research was to identify how the pandemic hampers the students of rural areas. We
applied different machine learning classifiers to identify how much impact the students experience on
their mental health. The result has three classes: “Yes”, “Partial”, and “No”. The 700 records were used
to train and test each model. The accuracy varied across models; among the five, SVM and RF
performed best, with the highest accuracy. Students of rural areas formed the target population, and
responses were collected from students under 18 years of age. The labels denote the impact of
COVID-19 on their mental health. Respondents were asked several questions and answered each by
choosing yes, no, or partial, where yes means total agreement, partial means slight agreement, and no
means total disagreement. The highest scores for the mentally hampered, partially hampered, and
no-impact classes were 18, 11, and 5, respectively, for primary school students, and 19, 12, and 5 for
high school students. The decision-making capability of the classifiers was measured by their
performance, using accuracy, precision, recall, and F-score. Overall accuracy, the fraction of correctly
identified samples in the test set, was taken as the primary standard for a classifier.

Table 1 gives the accuracy values for the classifiers. RF has the greatest accuracy, 0.9241, while KNN
and SVM deliver accuracies of 0.9146 and 0.9194, respectively. Because these accuracies are close,
additional performance indicators were needed to choose the appropriate classifier for our dataset.
There were 13 feature columns, with answers given as yes, no, or partial. We converted these answers
into 2, 0, and 1, respectively, by applying label encoding. The score of each respondent was computed
separately for the primary and high school levels. For primary school students, a score of (18-12) was
graded as impact on mental health “Yes”, (11-6) as “Partial”, and (5-0) as “No”. Similarly, for high
school students, a score of (19-13) was graded as “Yes”, (12-5) as “Partial”, and (5-0) as “No”.
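
The grading rule above can be written as a small function. This is a hedged sketch, not the authors' code: the function name is hypothetical, and because the stated high school ranges for “Partial” and “No” both include a score of 5, the sketch assumes that 5 maps to “No” at both levels.

```python
def impact_label(score, level):
    """Map a respondent's total score to an impact class for a school level."""
    # (lowest "Yes" score, lowest "Partial" score) per level.
    # ASSUMPTION: for high school, "Partial" starts at 6, so a score of 5 is "No".
    thresholds = {"primary": (12, 6), "high": (13, 6)}
    yes_min, partial_min = thresholds[level]
    if score >= yes_min:
        return "Yes"
    if score >= partial_min:
        return "Partial"
    return "No"


for score, level in [(15, "primary"), (11, "primary"), (12, "high"), (5, "high")]:
    print(score, level, impact_label(score, level))
```

Keeping the thresholds in one table makes it easy to adjust the boundary once the intended high school ranges are confirmed.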


Table 1. Accuracy of classifiers


Classifier                Accuracy
Decision tree             88.15%
K-nearest neighbors       91.46%
Gaussian naive Bayes      88.62%
Support vector machine    91.94%
Random forest             92.41%


Precision measures the agreement between the true data labels and the positive labels assigned by the
classifier. We compute the precision for each of the three class labels, since it is defined per class.
Table 2 shows the precision of every classifier for the three labels used in this research. Our aim was to
target students whose mental health is affected. The Gaussian naive Bayes classifier gives a precision
of 0.99 for the students whose mental health is strongly affected (“Yes”), and SVM also gives a
remarkable score of 0.97. To identify the samples with an impact on mental health, a score close to 1
was sought.

Recall, also known as sensitivity, reflects how completely a classifier identifies the samples of a
class. We again concentrated on getting the recall of the “Yes” class label close to 1. Table 3 shows the
recall scores for the three class labels and all classifiers. For the mentally hampered (“Yes”) class, the
recall scores for KNN, DT, and RF were 0.95, 0.94, and 0.91, respectively.
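
Per-class precision and recall can be computed from the true and predicted labels as sketched below. The label lists are invented for illustration and are not the study's test set; the convention of returning 0.0 when a class is never predicted is an assumption, chosen because it reproduces rows of zeros like those reported for the “No” class of Gaussian naive Bayes.

```python
def precision_recall(true_labels, predicted_labels, cls):
    """Return (precision, recall) for one class, treating it as the positive class."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == cls and p == cls)  # correctly predicted cls
    fp = sum(1 for t, p in pairs if t != cls and p == cls)  # wrongly predicted cls
    fn = sum(1 for t, p in pairs if t == cls and p != cls)  # missed cls samples
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical true and predicted labels for six children.
true_y = ["Yes", "Yes", "Yes", "Partial", "Partial", "No"]
pred_y = ["Yes", "Yes", "Partial", "Partial", "Yes", "Partial"]

for cls in ("Yes", "Partial", "No"):
    p, r = precision_recall(true_y, pred_y, cls)
    print(f"{cls}: precision={p:.2f}, recall={r:.2f}")
```

In this toy example the “No” class is never predicted, so both its precision and recall are reported as 0.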


Table 2. Precision scores of each classifier


Classifier                No      Partial   Yes
Decision tree             0.60    0.90      0.85
K-nearest neighbors       0.62    0.91      0.95
Gaussian naive Bayes      0.00    0.83      0.99
Support vector machine    0.88    0.89      0.97
Random forest             1.00    0.90      0.95


Table 3. Recall scores of each classifier


Classifier                No      Partial   Yes
Decision tree             0.50    0.85      0.94
K-nearest neighbors       0.42    0.94      0.95
Gaussian naive Bayes      0.00    0.99      0.87
Support vector machine    0.58    0.97      0.89
Random forest             0.58    0.97      0.91

The F-score relates the positive labels assigned by the classifier to the true labels. It is calculated as
the harmonic mean of precision and recall, for all three labels across all classifiers. A result close to 1
for the mentally hampered (“Yes”) class was sought when determining the best model for the
classification. Table 4 displays the F-scores for the class labels. According to these scores, the KNN,
SVM, and RF classifiers are the best choices for classifying our dataset.


Table 4. F1-scores of each classifier


Classifier                No      Partial   Yes
Decision tree             0.55    0.87      0.89
K-nearest neighbors       0.50    0.92      0.95
Gaussian naive Bayes      0.00    0.91      0.92
Support vector machine    0.70    0.93      0.93
Random forest             0.74    0.93      0.93
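
The harmonic-mean relation between precision, recall, and F-score can be checked against the reported tables. The sketch below recomputes the “Yes” column from the rounded precision and recall values in Tables 2 and 3; the function name and the zero-denominator convention are assumptions made for the example.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) for the "Yes" class, read from Tables 2 and 3.
yes_scores = {
    "Decision tree": (0.85, 0.94),
    "K-nearest neighbors": (0.95, 0.95),
    "Gaussian naive Bayes": (0.99, 0.87),
    "Support vector machine": (0.97, 0.89),
    "Random forest": (0.95, 0.91),
}

for name, (p, r) in yes_scores.items():
    print(f"{name}: F1 = {f1_score(p, r):.2f}")
```

Because the inputs are already rounded to two decimals, the recomputed values can differ from Table 4 by about 0.01 in the last digit.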


Our objective is to predict the mentally hampered individuals with high precision, which was achieved
by Gaussian naive Bayes and SVM. RF, SVM, and KNN perform well among the classifiers, with
remarkable accuracy, and the tables clearly show them to have the best performance as individual
classifiers. SVM works well because our dataset is fairly compact and the labels are known in advance.
KNN works well because there are few dimensions or features. The assumption of class independence
holds only for a very large dataset, which is why Gaussian naive Bayes does not perform well here.
Avoiding overfitting and achieving robustness would require a strong fit between the tree and the data,
which is not the case here. Because it is not robust and will not generalize well to future data, the
decision tree does not perform very well. The overall performance comparison is provided in
Figures 6 and 7.

Figure 6. Overall comparison of accuracy and F-score across classifiers

Figure 7. Precision, recall and F-score for mentally hampered (YES) class of people


6. CONCLUSION

In this paper, we focused on a very serious problem: the mental health of rural children during the
lockdown caused by the COVID-19 pandemic. We built a methodology to estimate the mental health of
rural children in certain districts. We employed clustering methods to identify the number of clusters.
The labels obtained using MOS were verified and supplied as input for training the classifiers. SVM
and KNN performed almost equivalently, and DT and Gaussian naive Bayes had comparable accuracy.
The highest accuracy, 92.41%, was obtained by random forest, although SVM and KNN also achieved
good accuracy. With a larger dataset the accuracy could improve, but the running time of random
forest would increase. If data can be collected from all districts, we can measure the mental health
development of all children in Bangladesh.